Schlager, Nina, Karsten Donnay, Hyunjung Kim and Ravi Bhavnani (2023). Drivers of COVID-19 Protest Across Localities in Israel: A Machine-Learning Approach, Political Research Exchange, http://dx.doi.org/10.1080/2474736X.2023.2257368.

Data availability statement:
This repository provides the data used in the article cited above. Wherever applicable, the data are provided in final processed form; the original sources of the raw data are listed in this documentation.

Index of data provided:

1. Data construction
scripts/helper_packages_load.R lists the packages needed to run the analysis.
scripts/helper_functions.R defines the custom functions needed to run the analysis.

All raw data sources are available in tabular form via the /data sub-directory.
	⁃	raw_acled.non_covid.csv includes non-Covid protest events in Israel between March 2020 and July 2022, sourced from ACLED.
	⁃	raw_acled.covid.xlsx includes Covid-19 protest events in Israel between March 2020 and July 2022, sourced from ACLED's curated Covid-19 Instability Tracker dataset.
	⁃	raw_covar.fixed.csv includes fixed information on municipality-level characteristics sourced from the Israeli Central Bureau of Statistics.
	⁃	raw_covar.covid19.csv includes weekly information on Covid-19 dynamics at the local government-level.
	⁃	raw_covar.oxford.xlsx includes four indices measuring state-level government policy responses to the pandemic, sourced from the Oxford Covid-19 Government Response Tracker (OxCGRT).
	⁃	Pop_Sex_Age_Relig_Setl.shp in the sub-directory data/shp_cbs includes spatial information to plot Figure 1. 
	⁃	covar.covid19-figure.csv features daily information on Covid-19 cases and hospitalisations at the local government-level to plot Figure 2. 
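
The raw files listed above come in three formats (.csv, .xlsx and .shp). The following is a minimal sketch of how they could be read in R, assuming the readxl and sf packages are installed. read_raw() is a hypothetical helper and not part of helper_functions.R; the demonstration uses a temporary toy file rather than the repository data.

```r
# Hypothetical helper (not part of the repository scripts):
# pick a reader based on the file extension.
read_raw <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = utils::read.csv(path, stringsAsFactors = FALSE),
    xlsx = readxl::read_excel(path),         # requires the readxl package
    shp  = sf::st_read(path, quiet = TRUE),  # requires the sf package
    stop("Unsupported file type: ", ext)
  )
}

# Demonstration on a temporary toy file, so the sketch runs anywhere.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(0.5, 1.5, 2.5)), tmp, row.names = FALSE)
toy <- read_raw(tmp)
str(toy)

# With the repository checked out, calls would look like:
# acled_non_covid <- read_raw("data/raw_acled.non_covid.csv")
# acled_covid     <- read_raw("data/raw_acled.covid.xlsx")
# localities      <- read_raw("data/shp_cbs/Pop_Sex_Age_Relig_Setl.shp")
```

The commented paths assume the working directory is the repository root.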

List of files:
	raw_acled.non_covid.csv 
	raw_acled.covid.xlsx
	raw_covar.fixed.csv
	raw_covar.covid19.csv
	raw_covar.oxford.xlsx
	Pop_Sex_Age_Relig_Setl.shp
	covar.covid19-figure.csv
	helper_packages_load.R
	helper_functions.R

List of related figures:
	Figure 1. Protest counts per municipality between March 2020 and July 2022. Larger circles indicate more protests. 
	Figure 2. Number of daily Covid (pink) and non-Covid protests (red) in Israeli municipalities relative to the daily change rate of Covid cases (purple) and vaccinations (blue) between March 2020 and July 2022.
	Figure S1. Protest counts per type and municipality between March 2020 and July 2022.

2. Imputation and descriptives

3. Generalized linear models
To plot Figure S2, raw_mechanism.csv provides a summary of variable names and corresponding mechanisms for each of the 60 covariates used in this study. 
The data is available in tabular form (.csv) via the /data sub-directory.

List of files:
	raw_mechanism.csv 
	helper_packages_load.R
	helper_functions.R

List of related figures:
	Figure S2. Top-20 important drivers of Covid and non-Covid protests based on the minimal Poisson regression model. 

4. Random forest models
Computing time for the random forest models is substantial.
Although the entire analysis can be replicated with the code provided, pre-run results for all random forest models are available in the /results sub-directory.

The /results sub-directory features the following pre-run performance statistics for random forest models as R-data frames (.Rda), per protest type:
	⁃	performance_rf_covid.count_full.Rda includes performance statistics for the full random forest and Covid protest.
	⁃	performance_rf_non_covid.count_full.Rda includes performance statistics for the full random forest and non-Covid protest.
	⁃	performance_rf_covid.count_min.Rda includes performance statistics for the minimal random forest and Covid protest.
	⁃	performance_rf_non_covid.count_min.Rda includes performance statistics for the minimal random forest and non-Covid protest.
	⁃	performance_rf_covid.count.temp_full.Rda includes performance statistics for the full random forest and Covid protest based on the temporal split sample.
	⁃	performance_rf_non_covid.count.temp_full.Rda includes performance statistics for the full random forest and non-Covid protest based on the temporal split sample.
	⁃	performance_rf_covid.count.temp_min.Rda includes performance statistics for the minimal random forest and Covid protest based on the temporal split sample.
	⁃	performance_rf_non_covid.count.temp_min.Rda includes performance statistics for the minimal random forest and non-Covid protest based on the temporal split sample.
	⁃	performance_rf_covid.bin_full.Rda includes performance statistics for the full random forest and Covid protest using a binary outcome variable.
	⁃	performance_rf_non_covid.bin_full.Rda includes performance statistics for the full random forest and non-Covid protest using a binary outcome variable.
	⁃	performance_rf_covid.bin_min.Rda includes performance statistics for the minimal random forest and Covid protest using a binary outcome variable.
	⁃	performance_rf_non_covid.bin_min.Rda includes performance statistics for the minimal random forest and non-Covid protest using a binary outcome variable.
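
The pre-run .Rda files can be inspected without re-running the models. Below is a minimal sketch, assuming nothing about the object names stored inside each file: load_rda() is a hypothetical helper (not part of helper_functions.R) that loads a file into its own environment and returns the contents as a named list. The demonstration uses a temporary toy file, not real results.

```r
# Hypothetical helper: load an .Rda file into a separate environment and
# return its contents as a named list, so that nothing in the global
# workspace is overwritten.
load_rda <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  mget(object_names, envir = env)
}

# Demonstration with a temporary toy file (placeholder values only).
toy_stats <- data.frame(metric = c("a", "b"), value = c(1, 2))
tmp <- tempfile(fileext = ".Rda")
save(toy_stats, file = tmp)

loaded <- load_rda(tmp)
names(loaded)  # names of the objects stored in the file

# With the repository checked out:
# perf <- load_rda("results/performance_rf_covid.count_full.Rda")
```

Returning a list rather than loading into the global environment avoids name clashes when several result files store objects under the same name.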

The /results sub-directory also features pre-run variable importance lists in tabular form (.csv) for all random forest models and per protest type:
	⁃	top20.rf_covid.count_full.csv includes the top-20 drivers of Covid protest counts based on the full random forest model as used to plot the left side of Figure 3.
	⁃	top20.rf_non_covid.count_full.csv includes the top-20 drivers of non-Covid protest counts based on the full random forest model as used to plot the right side of Figure 3.
	⁃	top20.rf_covid.count.temp_full.csv includes the top-20 drivers of Covid protest counts based on the full random forest model and temporal split sample as used to plot the left side of Figure S9.
	⁃	top20.rf_non_covid.count.temp_full.csv includes the top-20 drivers of non-Covid protest counts based on the full random forest model and temporal split sample as used to plot the right side of Figure S9.
	⁃	top20.rf_covid.bin_full.csv includes the top-20 drivers of Covid protest (binary) based on the full random forest model as used to plot the left side of Figure S12.
	⁃	top20.rf_non_covid.bin_full.csv includes the top-20 drivers of non-Covid protest (binary) based on the full random forest model as used to plot the right side of Figure S12.

To plot Figures 3, S9 and S12, raw_mechanism.csv provides a summary of variable names and corresponding mechanisms for each of the 60 covariates used in this study. 
The data is available in tabular form (.csv) via the /data sub-directory.
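
To illustrate how a variable importance list could be combined with the mechanism lookup before plotting, here is a minimal sketch using toy stand-ins. The column names (variable, importance, mechanism) and all values are assumptions for illustration; the actual columns of the top20.*.csv files and raw_mechanism.csv may differ.

```r
# Toy stand-ins for a top-20 list and the mechanism lookup.
# Column names and values are hypothetical.
top20 <- data.frame(
  variable   = c("x1", "x2", "x3"),
  importance = c(0.9, 0.5, 0.2),
  stringsAsFactors = FALSE
)
mechanisms <- data.frame(
  variable  = c("x1", "x2", "x3", "x4"),
  mechanism = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Left join: keep every listed driver, even if no mechanism is recorded.
labelled <- merge(top20, mechanisms, by = "variable", all.x = TRUE)

# Sort by decreasing importance, as one would for a ranked plot.
labelled <- labelled[order(-labelled$importance), ]
print(labelled)
```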

List of files:
	raw_mechanism.csv 
	performance_rf_covid.count_full.Rda
	performance_rf_non_covid.count_full.Rda
	performance_rf_covid.count_min.Rda
	performance_rf_non_covid.count_min.Rda
	performance_rf_covid.count.temp_full.Rda
	performance_rf_non_covid.count.temp_full.Rda
	performance_rf_covid.count.temp_min.Rda
	performance_rf_non_covid.count.temp_min.Rda
	performance_rf_covid.bin_full.Rda
	performance_rf_non_covid.bin_full.Rda
	performance_rf_covid.bin_min.Rda
	performance_rf_non_covid.bin_min.Rda
	top20.rf_covid.count_full.csv
	top20.rf_non_covid.count_full.csv
	top20.rf_covid.count.temp_full.csv
	top20.rf_non_covid.count.temp_full.csv
	top20.rf_covid.bin_full.csv
	top20.rf_non_covid.bin_full.csv
	helper_packages_load.R
	helper_functions.R

List of related figures:
	Figure 3. Top-20 important drivers of Covid and non-Covid protests based on the full random forest regression model.
	Figure S3. Cross-validation results for the full random forest model and Covid-19 protest (figures/rf_covid.count_full.pdf).
	Figure S4. Cross-validation results for the full random forest model and non-Covid protest (figures/rf_non_covid.count_full.pdf).
	Figure S5. Cross-validation results for the minimal random forest model and Covid-19 protest (figures/rf_covid.count_min.pdf).
	Figure S6. Cross-validation results for the minimal random forest model and non-Covid protest (figures/rf_non_covid.count_min.pdf).
	Figure S7. Temporal split sample: Cross-validation results for the minimal random forest model and Covid-19 protest (figures/rf_covid.count.temp_min.pdf).
	Figure S8. Temporal split sample: Cross-validation results for the minimal random forest model and non-Covid protest (figures/rf_non_covid.count.temp_min.pdf).
	Figure S9. Temporal split sample: Top-20 important drivers of Covid and non-Covid protests based on the full random forest regression model. 
	Figure S10. Binary outcome: Cross-validation results for the minimal random forest model and Covid-19 protest (figures/rf_covid.bin_min.pdf).
	Figure S11. Binary outcome: Cross-validation results for the minimal random forest model and non-Covid protest (figures/rf_non_covid.bin_min.pdf).
	Figure S12. Binary outcome: Top-20 important drivers of Covid and non-Covid protests based on the full random forest classification model. 

5. Cross-prediction
Pre-run GLM and random forest models as well as test datasets for both protest types are available in the /results sub-directory. 
The models are saved as R data files (.Rda) and the test data in tabular form (.csv).
	⁃	df.test.cross_rf_covid.csv includes Covid protest test data.
	⁃	df.test.cross_rf_non_covid.csv includes non-Covid protest test data.
	⁃	model_rf_covid.count_min.Rda includes the cross-validated minimal Covid protest random forest model.
	⁃	model_rf_non_covid.count_min.Rda includes the cross-validated minimal non-Covid protest random forest model.

The /results sub-directory features the following pre-run performance statistics as R-data frames (.Rda):
	⁃	performance_cross_glm_covid.count_full.Rda includes performance statistics for the GLM and Covid protest.
	⁃	performance_cross_glm_non_covid.count_full.Rda includes performance statistics for the GLM and non-Covid protest.
	⁃	performance_cross_rf_covid.count_full.Rda includes performance statistics for the random forest model and Covid protest.
	⁃	performance_cross_rf_non_covid.count_full.Rda includes performance statistics for the random forest model and non-Covid protest.
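
As an illustration of what a cross-prediction check might look like, the sketch below applies a model fitted on one protest type to test data of the other type and compares the prediction error. This reading of "cross-prediction", the use of RMSE, the simulated toy data and all object names are assumptions made for the example; it uses a base-R Poisson GLM as a stand-in and does not reproduce the stored models or results.

```r
set.seed(1)

# Simulated toy data: counts depend on a single covariate.
# All names and parameter values are hypothetical.
make_toy <- function(n, slope) {
  x <- rnorm(n)
  data.frame(x = x, count = rpois(n, lambda = exp(0.2 + slope * x)))
}
train_covid    <- make_toy(200, slope = 0.8)
test_covid     <- make_toy(100, slope = 0.8)
test_non_covid <- make_toy(100, slope = 0.3)

# Stand-in for a stored model: a Poisson GLM fitted on the "Covid" toy data.
model_covid <- glm(count ~ x, family = poisson(), data = train_covid)

# Root mean squared error of predicted versus observed counts.
rmse <- function(model, newdata) {
  pred <- predict(model, newdata = newdata, type = "response")
  sqrt(mean((newdata$count - pred)^2))
}

within_type <- rmse(model_covid, test_covid)      # same protest type
cross_type  <- rmse(model_covid, test_non_covid)  # other protest type
c(within = within_type, cross = cross_type)
```

With the repository files, the toy objects would be replaced by a model loaded from one of the model_rf_*.Rda files and a test set read from the corresponding df.test.cross_rf_*.csv file.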

List of files:
	df.test.cross_rf_covid.csv
	df.test.cross_rf_non_covid.csv
	model_rf_covid.count_min.Rda
	model_rf_non_covid.count_min.Rda
	performance_cross_glm_covid.count_full.Rda
	performance_cross_glm_non_covid.count_full.Rda
	performance_cross_rf_covid.count_full.Rda
	performance_cross_rf_non_covid.count_full.Rda
	helper_packages_load.R
	helper_functions.R

6. Alternative sample
The prepared dataset including information for all 255 Israeli local governments is provided in tabular form (255_protest.locality-week.csv), accessible via the /data sub-directory. 

The /results sub-directory features the following pre-run performance statistics for the full Poisson and random forest regressions as R-data frames (.Rda), per protest type:
	⁃	255_performance_glm_covid.count_full.Rda includes performance statistics for the full GLM and Covid protest.
	⁃	255_performance_glm_non_covid.count_full.Rda includes performance statistics for the full GLM and non-Covid protest.
	⁃	255_performance_rf_covid.count_full.Rda includes performance statistics for the full random forest and Covid protest.
	⁃	255_performance_rf_non_covid.count_full.Rda includes performance statistics for the full random forest and non-Covid protest.

The /results sub-directory also features pre-run variable importance lists in tabular form (.csv) for the full random forest models and per protest type to plot Figure S15:
	⁃	255_top20.rf_covid.count_full.csv includes the top-20 drivers of Covid protests to plot the left side of Figure S15.
	⁃	255_top20.rf_non_covid.count_full.csv includes the top-20 drivers of non-Covid protests to plot the right side of Figure S15.

To plot Figure S15, raw_mechanism.csv provides a summary of variable names and corresponding mechanisms for each of the 60 covariates used in this study. 
The data is available in tabular form (.csv) via the /data sub-directory.

List of files:
	raw_mechanism.csv 
	255_protest.locality-week.csv
	255_performance_glm_covid.count_full.Rda
	255_performance_glm_non_covid.count_full.Rda
	255_performance_rf_covid.count_full.Rda
	255_performance_rf_non_covid.count_full.Rda
	255_top20.rf_covid.count_full.csv
	255_top20.rf_non_covid.count_full.csv
	helper_packages_load.R
	helper_functions.R

List of related figures:
	Figure S13. Alternative sample: Cross-validation results for the minimal random forest model and Covid-19 protest (figures/255_rf_covid.count_min.pdf).
	Figure S14. Alternative sample: Cross-validation results for the minimal random forest model and non-Covid protest (figures/255_rf_non_covid.count_min.pdf).
	Figure S15. Alternative sample: Top-20 important drivers of Covid-19 and non-Covid protests based on the full random forest regression model.

